Confidence Estimation for Information Extraction
Abstract
Information extraction techniques automatically create structured databases from unstructured data sources, such as the Web or newswire documents. Despite the successes of these systems, accuracy will always be imperfect. For many reasons, it is highly desirable to accurately estimate the confidence the system has in the correctness of each extracted field. The information extraction system we evaluate is based on a linear-chain conditional random field (CRF), a probabilistic model which has performed well on information extraction tasks because of its ability to capture arbitrary, overlapping features of the input in a Markov model. We implement several techniques to estimate the confidence of both extracted fields and entire multi-field records, obtaining an average precision of 98% for retrieving correct fields and 87% for multi-field records.
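The average-precision figures above come from ranking extracted fields by their estimated confidence and checking how early the correct ones appear. The abstract does not spell out the formula, so the following is a minimal sketch that assumes the standard information-retrieval definition (mean of precision at the rank of each correct item). The function name `average_precision` and the sample data are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def average_precision(scored: List[Tuple[float, bool]]) -> float:
    """Average precision of a ranking by descending confidence.

    scored: (confidence, is_correct) pairs, one per extracted field.
    Assumes the standard IR definition; the paper may differ in detail.
    """
    # Rank fields from most to least confident.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            # Precision at this rank: correct fields seen so far / rank.
            precision_sum += hits / rank
    # With no correct fields the metric is undefined; return 0.0 by convention.
    return precision_sum / hits if hits else 0.0


# Hypothetical confidence scores and correctness labels for five fields.
fields = [(0.95, True), (0.90, True), (0.60, False), (0.40, True), (0.10, False)]
print(average_precision(fields))
```

A confidence estimator that places every correct field above every incorrect one scores 1.0 under this definition, so the metric rewards good ranking rather than well-calibrated probability values.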
Similar resources
Bayes Interval Estimation on the Parameters of the Weibull Distribution for Complete and Censored Tests
A method for constructing confidence intervals on parameters of a continuous probability distribution is developed in this paper. The objective is to present a model for an uncertainty represented by parameters of a probability density function. As an application, confidence intervals for the two parameters of the Weibull distribution along with their joint confidence interval are derived. The...
Towards Confidence Estimation for Typed Protein-Protein Relation Extraction
Systems which build on top of information extraction are typically challenged to extract knowledge that, while correct, is not yet well-known. We hypothesize that a good confidence measure for relational information has the property that such interesting information is found between information extracted with very high confidence and very low confidence. We discuss confidence estimation for the...
Improved word confidence estimation using long range features
This paper describes experiments in improving word confidence estimation using document- and task-level features of the hypothesized word sequence from a recognizer. The improved confidence estimates are shown to improve information extraction performance, specifically named entity (NE) recognition. The detected names can then be used to further improve confidence estimation in a multi-pass NE re...
Interval Estimation for the Exponential Distribution under Progressive Type-II Censored Step-Stress Accelerated Life-Testing Model Based on Fisher Information
This paper determines the confidence interval using the Fisher information under progressive type-II censoring for the k-step exponential step-stress accelerated life testing. We study the performance of these confidence intervals. Finally, an example is given to illustrate the proposed procedures.
Confidence Estimation for Knowledge Base Population
Information extraction systems automatically extract structured information from machine-readable documents, such as newswire, web, and multimedia. Despite significant improvement, the performance is far from perfect. Hence, it is useful to accurately estimate confidence in the correctness of the extracted information. Using the Knowledge Base Population Slot Filling task as a case study, we pr...